TITLE OF THE INVENTION
IMAGE PROCESSING APPARATUS AND METHOD FOR EXTRACTING 
FEATURE OF OBJECT 

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application is based upon and claims the
benefit of priority from the prior Japanese Patent
Applications No. 2000-074489, filed March 16, 2000; and
No. 2000-347043, filed November 14, 2000, the entire
contents of both of which are incorporated herein by
reference.

BACKGROUND OF THE INVENTION 
The present invention relates to an image 
processing apparatus and method, which are applied to 
entrance/exit management of an important facility, or
access management of a computer (terminal device) and
are suitably applied to a personal authentication 
apparatus for authenticating a person on the basis of 
vital information (biometrics) such as a facial image. 

In recent years, in view of application to a human
interface, security, and the like, a personal
authentication technique based on vital information 
(biometrics) such as a facial image, or the like is 
becoming increasingly important, and various techniques 
have been developed. 

As biometrics, fingerprint, palmprint, voiceprint,
and signature verification, retina and iris scans,
and the like have been put into practical use.
These schemes are categorized into a "contact type",
in which a person must directly touch a device, and a
"non-contact type", in which a person need not touch
the device.

Of the above examples of biometrics, fingerprint,
palmprint, and signature verification belong to the
"contact type". However, the category of signature
verification is slightly different from that of
fingerprint verification and the like since signature
verification requires a person to take an action, i.e.,
to sign.

A biometric that similarly requires a person's action
is voiceprint (speech) verification, but it belongs to
the "non-contact type". Retina and iris scans belong to
the "non-contact type" since a camera captures an image of
a desired portion and the captured image is processed.

However, since a retina is located at the back of an
eyeball and must be scanned while the eye nearly
contacts a lens, its category is close to "contact type".
Note that the iris is present on the surface of
the eyeball, and can be scanned by a camera at a
separate position. However, since a micropattern must
be scanned, the maximum distance between the eye and 
camera is naturally limited. 

On the other hand, facial image verification as a
non-contact biometric has received a lot of attention
recently, and various techniques using it have been
developed. A facial pattern has a larger scale than
the aforementioned iris pattern.

In a personal authentication apparatus that adopts 
facial image verification, one monitor camera is set at 
an appropriate position to capture a facial image of a 
full-face or nearly full-face pose, and the captured
image undergoes pattern verification with facial image
data registered in advance under similar conditions.

Also, a system in which a camera is set at a 
relatively lower position to capture a looked-up facial
image and to register and verify it has been proposed.

For example, Jpn. Pat. Appln. KOKAI Publication 
No. 11-196398 discloses a technique that pertains to an 
image processing apparatus in which video cameras are 
laid out to make a given angle, a facial image of a
full-face pose is stored in a full-face template memory,
a facial image of a left-half-face pose is stored in a
half-face template memory, and the correlation between
images output from the video cameras and data stored
in the memories is computed to determine the facial
motion of a person (to be referred to as prior art 1
hereinafter).

Furthermore, Fukui and Yamaguchi, "Facial Feature 
Point Extraction by Combining Shape Extraction and 
Pattern Verification", Journal of IEICE (D-II),
Vol. J80-D-II, No. 8, August 1997, proposes a method
of extracting facial feature points of a pupil, nasal
cavity, mouth edge, and the like from a moving image at
high speed and high precision for facial recognition
(to be referred to as prior art 2 hereinafter).

Also, Yamaguchi, Fukui, and Maeda, "Facial 
Recognition System Using Moving Image", IEICE
Transactions PRMU97-50, June 1997, proposes a personal
identification method for facial recognition using a
moving image (time-series images) in place of a single
image (to be referred to as prior art 3 hereinafter).

BRIEF SUMMARY OF THE INVENTION

(1) An image processing apparatus of the present
invention comprises a plurality of image sensing means
for sensing object images from different directions,
normalization means for extracting feature points from
the object images sensed by the plurality of image
sensing means, setting a feature region on the basis of
the extracted feature points, segmenting the set
feature region into a plurality of regions, computing
an average value of brightness levels in each segmented
region, and computing a feature pattern on the basis of
the computed average value, registration means for
registering the feature pattern computed by the
normalization means as a feature pattern associated
with a predetermined object, and verification means for
specifying an object associated with the object image
by comparing the feature pattern computed by the
normalization means with the feature pattern registered
in the registration means.

(2) An image processing apparatus of the present
invention comprises image input means for sensing an
object image from different positions, and inputting a
plurality of object images at different image sensing
positions, feature extraction means for extracting
feature patterns that represent features of an object
from the plurality of object images input by the image
input means, verification means for verifying the
plurality of feature patterns extracted by the feature
extraction means with a reference feature pattern which
is registered in advance, and discrimination means for,
when at least one of the plurality of feature patterns
extracted by the feature extraction means matches the
reference feature pattern which is registered in
advance as a result of verification of the verification
means, determining that an object associated with that
object image is a person himself or herself.

(3) An image processing apparatus of the present
invention comprises image input means for sensing an
object image from different positions, and inputting a
plurality of object images at different image sensing
positions, input image determination means for
determining an image sensing position of an object
image to be used from the plurality of object images
input by the image input means upon registration of a
feature pattern, first feature extraction means for
extracting a feature pattern which represents a feature
of an object from the object image determined by the
input image determination means, registration means for
registering the feature pattern extracted by the first
feature extraction means as a reference feature pattern
associated with the object in correspondence with
position information indicating the image sensing
position of the corresponding object image,
verification image selection means for selecting an
object image at an image sensing position, which
corresponds to the position information registered
together with the feature pattern of the object to be
verified registered in the registration means, of the
plurality of object images input by the image input
means upon verification of a feature pattern, second
feature extraction means for extracting a feature
pattern which represents a feature of the object from
the object image selected by the verification image
selection means, and verification means for specifying
an object associated with the object image by verifying
the feature pattern extracted by the second feature
extraction means with the feature pattern of the object
to be verified registered in the registration means.

(4) An image processing method of the present
invention comprises the first step of sensing object
images from different directions, the second step of
extracting feature points from the object images sensed
in the first step, setting a feature region on the
basis of the extracted feature points, segmenting the
set feature region into a plurality of regions,
computing an average value of brightness levels in each
segmented region, and computing a feature pattern on
the basis of the computed average value, the third step
of registering the feature pattern computed in the
second step as a feature pattern associated with a
predetermined object, and the fourth step of specifying
an object associated with the object image by comparing
the feature pattern computed in the second step with
the feature pattern registered in the third step.

(5) An image processing method of the present
invention comprises the first step of sensing an object
image from different positions, and inputting a
plurality of object images at different image sensing
positions, the second step of extracting feature
patterns that represent features of an object from the
plurality of object images input in the first step, the
third step of verifying the plurality of feature
patterns extracted in the second step with a reference
feature pattern which is registered in advance, and the
fourth step of determining, when at least one of the
plurality of feature patterns extracted in the second
step matches the reference feature pattern which is
registered in advance as a result of verification of
the third step, that an object associated with that
object image is a person himself or herself.

(6) An image processing method of the present
invention comprises the first step of sensing an object
image from different positions, and inputting a
plurality of object images at different image sensing
positions, the second step of determining an image
sensing position of an object image to be used from the
plurality of object images input in the first step upon
registration of a feature pattern, the third step of
extracting a feature pattern which represents a feature
of an object from the object image determined in the
second step, the fourth step of registering the feature
pattern extracted in the third step as a reference
feature pattern associated with the object in
correspondence with position information indicating the
image sensing position of the corresponding object
image, the fifth step of selecting an object image at
an image sensing position, which corresponds to the
position information registered together with the
feature pattern of the object to be verified registered
in the fourth step, of the plurality of object images
input in the first step upon verification of a feature
pattern, the sixth step of extracting a feature pattern
which represents a feature of the object from the
object image selected in the fifth step, and the
seventh step of specifying an object associated with
the object image by verifying the feature pattern
extracted in the sixth step with the feature pattern of
the object to be verified registered in the fourth step.

Additional objects and advantages of the invention 
will be set forth in the description which follows, and 
in part will be obvious from the description, or may be
learned by practice of the invention. The objects and
advantages of the invention may be realized and
obtained by means of the instrumentalities and 
combinations particularly pointed out hereinafter. 

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING 
The accompanying drawings, which are incorporated
in and constitute a part of the specification,
illustrate presently preferred embodiments of the
invention, and together with the general description
given above and the detailed description of the
preferred embodiments given below, serve to explain the
principles of the invention.

FIG. 1 is a schematic block diagram showing the
arrangement of an image processing apparatus according
to the first embodiment of the present invention;

FIG. 2 shows an example of a camera layout in the
first embodiment;

FIG. 3 shows examples of facial images captured by
respective cameras in the first embodiment;

FIG. 4 is a flow chart for explaining facial image
registration and verification processes by a processor
according to the first embodiment;

FIG. 5 shows examples of feature point extraction
results in the first embodiment;

FIG. 6 is a view for explaining a normalization
process in the first embodiment;

FIG. 7 is a view for explaining a normalization
process in the first embodiment;

FIG. 8 is a view for explaining a scheme for
computing a feature pattern in a vector format in the
first embodiment;

FIG. 9 shows an example of a camera layout in an
image processing apparatus according to the second
embodiment of the present invention;

FIG. 10 shows an example of extracted facial
images and feature points in the second embodiment;

FIG. 11 is a view for explaining a normalization
process when three feature points are obtained in the
second embodiment;

FIG. 12 is a view for explaining a normalization
process when three feature points are obtained in the
second embodiment;

FIG. 13 shows an example of a camera layout in an
image processing apparatus according to the third
embodiment of the present invention;

FIG. 14 shows an example of a camera layout in an
image processing apparatus according to the sixth
embodiment of the present invention;

FIG. 15 is a chart for explaining a registration
process according to the seventh embodiment of the
present invention;

FIG. 16 is a chart for explaining a verification
process according to the seventh embodiment;

FIG. 17 shows an example of a camera layout in an
image processing apparatus according to the eighth
embodiment of the present invention;

FIG. 18 is a schematic block diagram showing the
arrangement of the image processing apparatus according
to the eighth embodiment; and

FIG. 19 is a flow chart for explaining a
verification computation process by a processor
according to the eighth embodiment.

DETAILED DESCRIPTION OF THE INVENTION 
Preferred embodiments of the present invention
will be described hereinafter with reference to the
accompanying drawings.

The first embodiment of the present invention will 
be described below. 

FIG. 1 shows the arrangement of an image
processing apparatus according to the first embodiment.
Referring to FIG. 1, a plurality of video cameras (to
be simply referred to as cameras hereinafter) 1-1,
1-2,..., 1-N as image sensing means for sensing an
object image line up vertically at a given spacing.
This embodiment uses video cameras that can sense
monochrome images. The outputs from the cameras 1-1,
1-2,..., 1-N are respectively connected to capture
boards 2-1, 2-2,..., 2-N.

That is, a video signal (analog data) 1 from the
camera 1-1 is converted into digital data by an A/D
converter 2-1a, and the digital data is temporarily
stored in an image memory 2-1b. The capture boards
2-2,..., 2-N similarly comprise A/D converters and
image memories (not shown for the sake of simplicity),
and video signals 2 to N undergo similar processes in
the capture boards 2-2,..., 2-N.

When each of the cameras 1-1, 1-2,..., 1-N
comprises a USB (Universal Serial Bus) interface, a USB
interface is added in place of the A/D converter 2-1a.
The same applies to a case wherein the camera comprises
a digital I/O interface such as IEEE1394 other than USB.

The capture boards 2-1, 2-2,..., 2-N, a processor
4, a work memory 5, a display board 6, and a dictionary
7 are connected to each other via a system bus 3 so as
to be able to communicate with each other. Furthermore,
a display 8 is connected to the display board 6. The
dictionary 7 is a registration means in which a
plurality of dictionary data (reference feature
patterns) are registered (stored). The display 8
outputs, e.g., a verification result.

The processor 4 sends a synchronization control
signal to the cameras 1-1, 1-2,..., 1-N,
receives digital data that pertain to facial images
sent from the cameras 1-1, 1-2,..., 1-N, and executes
facial image registration, verification, and
determination processes (to be described later) of the
received data using the work memory 5 and dictionary 7.

Note that processors exclusively used to process
images sensed by the cameras 1-1, 1-2,..., 1-N may be
arranged in parallel to achieve high-speed processing.

FIG. 2 shows details of an example of the layout
of the cameras 1-1, 1-2,..., 1-N, and FIG. 3 shows
examples of facial images sensed by the cameras 1-1,
1-2,..., 1-N.

In this example, three cameras 1-1, 1-2, and 1-3 
line up vertically to have an appropriate spacing, and 
an object (face) F is located in front of these cameras 
1-1, 1-2, and 1-3. With this layout, the cameras 1-1,
1-2, and 1-3 respectively sense and capture a
looked-down facial image (see (a) of FIG. 3), a frontal
facial image (see (b) of FIG. 3), and a looked-up 
facial image (see (c) of FIG. 3). 

Note that the cameras 1-1, 1-2, and 1-3 are
illustrated as independent input devices.
Alternatively, the cameras 1-1, 1-2, and 1-3 may be housed
in a single housing, so that they appear as a single
input device for the user.

The facial image registration and verification
processes by the processor 4 of the image processing
apparatus of the first embodiment will be described in
detail below with reference to the flow chart shown in
FIG. 4.

Note that the flow chart in FIG. 4 shows processes
to be individually done for facial images captured by
the cameras 1-1, 1-2, and 1-3.

The cameras 1-1, 1-2, and 1-3 sense an object
image from different directions (S0). The processor 4
seeks a facial image region from the entire input image
(S1). Pupil and nasal cavity regions, which are
regarded as substantially circular regions, are
detected, and the central positions of these regions
are detected as feature points of the facial image (S2).
Note that the processes in steps S1 and S2 can adopt
the conventional method described in, e.g., prior art 2.
The detection results of the feature points are as
shown in (a), (b), and (c) of FIG. 5, and "x" marks in
these figures indicate the detected feature points.
Note that (a) of FIG. 5 corresponds to the facial image
captured by the camera 1-1, (b) of FIG. 5 corresponds
to the facial image captured by the camera 1-2, and (c)
of FIG. 5 corresponds to the facial image captured by
the camera 1-3.

Then, a normalization process including extraction
of a feature region (S3) and segmentation of the region
(S4) is executed. The normalization process in steps
S3 and S4 is a core of the facial image registration
and verification processes in this embodiment. The
normalization process will be described in detail below
with reference to FIGS. 6 and 7.

As shown in FIGS. 6 and 7, the normalization 
process takes different procedures depending on the 
number of obtained feature points (two or four points). 
If the number of obtained feature points is other than
2 or 4, the normalization process is skipped.

A feature region is defined as a rectangle
obtained by enlarging a rectangle specified by a
plurality of feature points at a magnification set for
each camera. More specifically, when two feature
points are obtained, as shown in, e.g., (a) of FIG. 6,
a rectangular region (p1, p2, p3, p4) obtained by
further enlarging a rectangle formed by giving a
predetermined width to a line segment f1f2 at a
predetermined magnification is defined as a feature
region, as shown in (b) of FIG. 6. In this example,
the obtained feature region is broken up into 10 x 5
small rectangular regions, as shown in (c) of FIG. 6.
On the other hand, when four feature points are
obtained, as shown in, e.g., (a) of FIG. 7, a
rectangular region (q1, q2, q3, q4) obtained by further
enlarging a rectangle formed by giving a predetermined
width to line segments f1f2 and f3f4 at a predetermined
magnification is defined as a feature region, as shown
in (b) of FIG. 7. In this example, the obtained
feature region is broken up into 15 x 15 small
rectangular regions, as shown in (c) of FIG. 7.

In this embodiment, a rectangular feature region
is used. However, the present invention is not limited
to this as long as a region can be uniquely computed
from two or four feature points. On the other hand,
the feature region is broken up by equally segmenting
the respective sides of the rectangle. Alternatively,
the segmented small areas may have variable sizes, i.e.,
areas, in consideration of feature strength. The
number of equally segmented areas is not limited to the
aforementioned values. As a characteristic feature of
the image processing apparatus of this embodiment,
parameters of the normalization process can be
optimized in units of cameras.
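
As an illustration only, the construction of a feature
region from two feature points might be sketched as
follows. The function name, the interpretation of the
"predetermined width" as the extent perpendicular to the
line segment, the enlargement about the center of the
rectangle, and the ordering of the returned corners are
all assumptions made for this sketch and are not
specified in the description.

```python
import math


def feature_region_from_two_points(f1, f2, width, magnification):
    """Return four corners of an enlarged rectangle built around segment f1-f2.

    Hypothetical helper: `width` is taken as the rectangle's extent
    perpendicular to the segment, and the enlargement is applied about the
    rectangle center. Both choices are assumptions of this sketch.
    """
    (x1, y1), (x2, y2) = f1, f2
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("feature points must be distinct")
    # Unit vector along the segment and the unit vector perpendicular to it.
    ux, uy = (x2 - x1) / length, (y2 - y1) / length
    nx, ny = -uy, ux
    # Center of the rectangle and its enlarged half-extents.
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_len = magnification * length / 2.0
    half_wid = magnification * width / 2.0
    corners = []
    for a, b in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append((cx + a * half_len * ux + b * half_wid * nx,
                        cy + a * half_len * uy + b * half_wid * ny))
    return tuple(corners)


# Example: eyes 10 pixels apart, width 4, magnification 2 (values are made up).
region = feature_region_from_two_points((0, 0), (10, 0), 4, 2)
```

Because the magnification is an argument, a different
value can be passed for each camera, which corresponds to
optimizing the normalization parameters in units of
cameras.
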

The description will revert to FIG. 4. A feature
pattern for registration and verification is computed
(extracted) (S5). In this process, the average values
of brightness levels are computed in units of partial
regions broken up in step S4, and are arranged in an
order (raster scan order) shown in FIG. 8 to express
facial feature pattern data (to be simply referred to
as a feature vector hereinafter):

(V11, V12,..., Vnm-1, Vnm)
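
A minimal sketch of this feature vector computation is
given below, assuming that the feature region has already
been cropped to an axis-aligned two-dimensional array of
brightness levels. The function name and the way cell
boundaries are rounded are assumptions of the sketch.

```python
def grid_mean_feature(region, n_rows, n_cols):
    """Average brightness per grid cell, listed in raster scan order.

    `region` is a list of rows of brightness values (hypothetical input
    format). The region is split into n_rows x n_cols cells.
    """
    height, width = len(region), len(region[0])
    if height < n_rows or width < n_cols:
        raise ValueError("region is smaller than the requested grid")
    feature = []
    for i in range(n_rows):
        # Integer cell boundaries; cells differ by at most one pixel in size.
        top, bottom = i * height // n_rows, (i + 1) * height // n_rows
        for j in range(n_cols):
            left, right = j * width // n_cols, (j + 1) * width // n_cols
            cell = [region[y][x]
                    for y in range(top, bottom)
                    for x in range(left, right)]
            feature.append(sum(cell) / len(cell))
    return feature


# A 4 x 4 region split into 2 x 2 cells (brightness values are made up).
sample = [[0, 0, 10, 10],
          [0, 0, 10, 10],
          [20, 20, 30, 30],
          [20, 20, 30, 30]]
vector = grid_mean_feature(sample, 2, 2)  # [0.0, 10.0, 20.0, 30.0]
```

The outer loop runs over rows and the inner loop over
columns, so the elements appear in the order V11, V12,...,
Vnm.
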

As a modification of the process in step S5,
pixels in the feature region undergo density correction
prior to the average value computation to normalize the
density range. Alternatively, a feature emphasis
filter process such as a differential process may be
executed. Furthermore, other density or graphic
feature amounts such as most frequent values (mode
values), principal directions of edges (extracted by
the differential process), and the like may be used in
place of the average values as feature amounts in
segmented small areas.

It is then checked whether registration or verification
is to be made (S6). If registration is to be made, a
plurality of feature patterns are stored (S7, S8),
dictionary data are generated using the scheme (partial
space method) of prior art 2 (S9), and a procedure for
registering dictionary data in the dictionary 7
together with personal attribute information is done
(S10), thus ending registration.

On the other hand, if verification is to be made,
a verification computation process using the scheme
(partial space method) of prior art 2 is done (S11),
and a verification level output process is then
executed (S12).

As described above, according to the first
embodiment, a single input facial image (still image)
is to be processed. In consideration of variations of
the input conditions, a plurality of successive facial
images (moving image) may be captured at a given time
interval, and may be buffered on the image memory 2-1b
of the capture board 2-1 shown in FIG. 1. The method
of making verification computation using a moving image
(time-series images) is described in detail in prior
art 3 (mutual partial space method).

In the flow upon processing a moving image, the
processes in steps S1 to S4 can be the same as those in
the flow chart in FIG. 4, and only the contents of
steps S9 and S10 need be changed. That is, after
a predetermined number of feature patterns are stored
as in registration, a statistical process such as
principal component analysis or the like is done based on
this plurality of feature vectors, and a pattern
verification process is then executed.

A characteristic feature of the first embodiment
mentioned above lies in integrated control of processes
of facial images from the cameras 1-1, 1-2, and 1-3.
To summarize, the processor 4 executes the following
control.

First, the cameras 1-1 and 1-2 detect the central
positions of the eyes as two (right and left) feature
points, and the camera 1-3 detects the two central
points of the nasal cavities, thus obtaining a total of
four feature points.

If this condition is not satisfied, and at least 
one of these cameras fails to detect an image, it is 
determined that the process at that time is not
successful, and the process is interrupted immediately.
Then, the process is retried or an error message is
displayed for the user.

Second, upon registration, feature vectors
corresponding to facial images from all the cameras 1-1,
1-2, and 1-3 are registered as a set of person data.

Third, if verification levels between facial
images captured by all the cameras 1-1, 1-2, and 1-3
and registered data of a given person are output as M1,
M2, and M3, it is determined that "that person is
identified" only when a condition:

M1 >= K1, and M2 >= K2, and M3 >= K3

where K1, K2, and K3 are predetermined threshold
values is satisfied.

Upon searching for a person corresponding to the
input facial image, of persons whose data satisfy the
above condition (if no person data satisfies the above
condition, "no match" is determined), a person
corresponding to a maximum average value of
verification levels, i.e.,

(M1 + M2 + M3)/3

is finally determined to be a match.
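
The decision rule described above might be sketched as
follows. The function name, the dictionary that maps each
registered person to a tuple of verification levels, and
the use of None to represent "no match" are assumptions
introduced for this illustration.

```python
def identify(levels_by_person, thresholds):
    """Return the best-matching person, or None when there is no match.

    `levels_by_person` maps a person identifier to the verification levels
    (M1, M2, M3) obtained from the cameras; `thresholds` holds (K1, K2, K3).
    Both data layouts are hypothetical.
    """
    best_person, best_average = None, None
    for person, levels in levels_by_person.items():
        if len(levels) != len(thresholds):
            raise ValueError("one verification level per camera is required")
        # Every camera must reach its own threshold.
        if all(m >= k for m, k in zip(levels, thresholds)):
            average = sum(levels) / len(levels)
            # Among the persons passing all thresholds, keep the largest average.
            if best_average is None or average > best_average:
                best_person, best_average = person, average
    return best_person


# Made-up verification levels for three registered persons.
levels = {
    "A": (0.9, 0.8, 0.7),
    "B": (0.95, 0.95, 0.4),   # fails the third threshold
    "C": (0.8, 0.85, 0.9),
}
match = identify(levels, (0.6, 0.6, 0.6))  # "C"
```

Person B has the highest levels for two cameras but is
rejected because one level is below its threshold, which
reflects the requirement that all cameras agree.
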

The second embodiment of the present invention
will be described below.

The basic arrangement of an image processing 
apparatus according to the second embodiment is 
substantially the same as that in FIG. 1 (first 
embodiment), except that the cameras 1-1, 1-2, and 1-3
line up horizontally.

The layout of these cameras 1-1, 1-2, and 1-3 is
as shown in FIG. 9. That is, as shown in FIG. 9, the
cameras 1-1, 1-2, and 1-3 line up horizontally at an
appropriate spacing so as to be angled slightly upward.
With this layout, the cameras 1-1, 1-2, and 1-3
respectively sense and capture a facial image of a left
half-face pose, a looked-up facial image of a full-face
pose, and a facial image of a right half-face pose.

Examples of facial images input from the cameras
1-1, 1-2, and 1-3 and feature points extracted
therefrom are as shown in (a), (b), and (c) of FIG. 10,
and "x" marks in these figures indicate the detected
feature points. Note that (a) of FIG. 10 corresponds
to the camera 1-1, (b) of FIG. 10 corresponds to the
camera 1-2, and (c) of FIG. 10 corresponds to the
camera 1-3.

The process executed by the processor 4 is 
substantially the same as that in the first embodiment, 
except that the process executed upon obtaining two 
feature points in the first embodiment is replaced by
that executed upon obtaining three feature points, as
will be described in detail below. 

Examples of the normalization process upon 
obtaining three feature points will be explained below 
with reference to FIGS. 11 and 12. 

More specifically, when three feature points are
obtained from the right half-face facial image, as
shown in (a) of FIG. 11, a rectangular region (p1, p2,
p3, p4) based on a triangle (f1, f2, f3) is defined to
be a feature region, as shown in (b) of FIG. 11. In
this example, this feature region is broken up into
10 x 15 small rectangular regions, as shown in (c) of
FIG. 11.

Likewise, when three feature points are obtained
from the left half-face facial image, as shown in (a)
of FIG. 12, a rectangular region (q1, q2, q3, q4) based
on a triangle (f1, f2, f3) is defined to be a feature
region, as shown in (b) of FIG. 12. In this example,
this feature region is broken up into 10 x 15 small
rectangular regions, as shown in (c) of FIG. 12.

In order to obtain a rectangular region from three
feature points, a line which is parallel to a line
segment f1f2 and passes through f3 is drawn, two points
separated by a given width from f3 are set on the line on
the two sides of f3, and a rectangle having as vertices a
total of four points, i.e., these two points and f1
and f2, is enlarged at a predetermined magnification.
At this time, the magnification is set in advance for
each camera. However, the present invention is not
limited to this method, and a feature region need only
be defined in advance so that it can be uniquely
determined.
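
The construction from three feature points might be
sketched as follows. The function name, the vertex
ordering, and the enlargement about the centroid of the
four vertices are assumptions of this sketch. Note also
that the four points form an exact rectangle only for
suitable positions of f3 and a suitable width; the sketch
simply scales the quadrilateral defined by the four points.

```python
import math


def region_from_three_points(f1, f2, f3, width, magnification):
    """Return four vertices built from f1, f2 and two points beside f3.

    Hypothetical helper: two points are placed at distance `width` on either
    side of f3 along a line parallel to segment f1-f2, and the resulting
    quadrilateral is scaled about the centroid of its vertices.
    """
    dx, dy = f2[0] - f1[0], f2[1] - f1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("f1 and f2 must be distinct")
    # Unit vector parallel to segment f1-f2.
    ux, uy = dx / length, dy / length
    # Two points on the parallel line through f3, on either side of f3.
    a = (f3[0] - width * ux, f3[1] - width * uy)
    b = (f3[0] + width * ux, f3[1] + width * uy)
    vertices = [f1, f2, b, a]
    cx = sum(x for x, _ in vertices) / 4.0
    cy = sum(y for _, y in vertices) / 4.0
    # Enlarge about the centroid by the per-camera magnification.
    return tuple((cx + magnification * (x - cx), cy + magnification * (y - cy))
                 for x, y in vertices)


# Made-up coordinates: f1 and f2 on one line, f3 midway on a parallel line.
region = region_from_three_points((0, 0), (4, 0), (2, 3), 2, 2)
```
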

The third embodiment of the present invention will
be described below.

The basic arrangement of an image processing
apparatus according to the third embodiment is
substantially the same as that in FIG. 1 (first
embodiment), except that nine cameras 1-1, 1-2,..., 1-9
are set two-dimensionally.

The layout of these cameras 1-1, 1-2,..., 1-9 is
as shown in FIG. 13. That is, as shown in FIG. 13, the
cameras 1-1, 1-2, and 1-3 respectively capture slightly
looked-down facial images of a left half-face pose,
full-face pose, and right half-face pose. The cameras
1-4, 1-5, and 1-6 respectively capture facial images of
a left half-face pose, full-face pose, and right
half-face pose. Furthermore, the cameras 1-7, 1-8, and
1-9 respectively capture slightly looked-up facial
images of a left half-face pose, full-face pose, and
right half-face pose.

Note that the process executed by the processor 4 
according to the third embodiment can be implemented by 
combining the processes of the first and second 
embodiments described above. 

The fourth embodiment of the present invention 
will be described below. 

The basic arrangement of an image processing 
apparatus according to the fourth embodiment is 
substantially the same as that in FIG. 1 (first 
embodiment). Also, the camera layout is the same as 
that in one of the first to third embodiments. 

The process executed by the processor 4 according
to the fourth embodiment can be realized by adding a
feature vector extraction process to that explained in
the first to third embodiments. That is, an
integration process of feature vectors extracted from a
plurality of facial images is added.

More specifically, when a plurality of cameras 1-1,
1-2,..., 1-n are deployed, if feature vectors:

camera 1-1: (V11,..., V1L1) (L1 dimensions)
camera 1-2: (V21,..., V2L2) (L2 dimensions)
...
camera 1-n: (Vn1,..., VnLn) (Ln dimensions)

are computed from the outputs of the cameras 1-1,
1-2,..., 1-n, these feature vectors are integrated by
coupling in turn:

(V11,..., V1L1, V21,..., V2L2,..., Vn1,..., VnLn)
(L1+L2+...+Ln dimensions)

As a result of such feature vector integration,
the same registration and verification processes
(partial space method) as for a single camera, described
in, e.g., prior art 2, can be performed even though a
plurality of cameras are used.
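
The integration by coupling described above may be sketched as follows. This is an illustrative example only; the function name `integrate_cameras` is hypothetical, and the feature vectors are assumed to be plain sequences of numbers, one per camera, in camera order.

```python
from itertools import chain
from typing import List, Sequence


def integrate_cameras(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Couple the per-camera feature vectors in turn into one vector.

    vectors[0] is the vector from the camera 1-1 (L1 dimensions),
    vectors[1] the vector from the camera 1-2 (L2 dimensions), and so on.
    The result has L1 + L2 + ... + Ln dimensions.
    """
    return list(chain.from_iterable(vectors))
```

Because the result is a single vector, it can be handed to a registration or verification routine written for one camera without changing that routine.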

The fifth embodiment of the present invention will
be described below.

The basic arrangement of an image processing
apparatus according to the fifth embodiment is
substantially the same as that in FIG. 1 (first
embodiment). Also, the camera layout is the same as
that in one of the first to third embodiments.

The process executed by the processor 4 according 
to the fifth embodiment can be realized by adding a 
feature vector extraction process to that explained in 
the first to third embodiments. Note that input facial 
images are time-serially captured at given time 
intervals, and a process to be added for the purpose of 
improving the verification precision is an integration 
process of feature vectors on the time series, as will 
be described below. 

More specifically, when feature vectors at times 1,
2,..., s (assume that feature patterns from a plurality
of cameras have already been integrated by the method
of the fourth embodiment, and L represents the total
dimension) are computed, the input feature vectors

time 1: (V11,..., V1L)
time 2: (V21,..., V2L)
...
time s: (Vs1,..., VsL)

are integrated on the time series to obtain an average
vector described by:

\[ \bar{V} = \frac{1}{s} \left( \sum_{i=1}^{s} V_{i1}, \ldots, \sum_{i=1}^{s} V_{iL} \right) \]

when a moving image is used.

As a result of integration on the time series, the
method of prior art 2 can be used as in the case of a
single camera. Alternatively, the method of prior art 3
may be applied without executing such a process for
obtaining the average feature vector.
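
The integration on the time series, i.e., the element-wise averaging of the feature vectors obtained at times 1 to s, may be sketched as follows. This is an illustrative example only; the function name `average_over_time` is hypothetical, and each frame is assumed to be an already integrated L-dimensional vector.

```python
from typing import List, Sequence


def average_over_time(frames: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise average of s feature vectors of equal dimension L."""
    if not frames:
        raise ValueError("at least one feature vector is required")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all feature vectors must have the same dimension")
    s = len(frames)
    # The j-th component of the average is (V1j + V2j + ... + Vsj) / s.
    return [sum(frame[j] for frame in frames) / s for j in range(dim)]
```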

As described above, according to the first to
fifth embodiments, a plurality of cameras are set
within a range in which they satisfy a given condition
so as to simultaneously capture facial images of an
identical person, and a correction process is done on
the basis of a total of four pieces of feature point
position information of pupils and nasal cavities, thus
improving the verification precision while keeping
the load on the user as light as in the prior art.

When feature patterns extracted from a plurality
of facial images are integrated into one so as to execute
the same pattern verification process as in the prior art,
increases in complicated computation and processing
cost can be minimized.

The sixth embodiment of the present invention will 
be described below. 

The basic arrangement of an image processing
apparatus according to the sixth embodiment is
substantially the same as that in FIG. 1 (first
embodiment), except for the layout and use method of
cameras 1-1, 1-2,..., 1-N, and a corresponding
integration process.

FIG. 14 shows an example wherein two cameras 1-1
and 1-2 are laid out vertically. As shown in FIG. 14,
the cameras 1-1 and 1-2 line up vertically at an
appropriate spacing and are angled slightly upward. An
object F is located in front of these cameras 1-1 and
1-2. With this layout, the cameras 1-1 and 1-2 sense
slightly looked-up facial images of a full-face pose.

In FIG. 14, reference numeral 9-1 denotes an
illumination lamp arranged above the camera 1-1; and
9-2, an illumination lamp arranged below the camera 1-2.
In the example shown in FIG. 14, the camera 1-1 is
set at a level that assumes the average height of an
adult, and the camera 1-2 is set at a level that
assumes the height of an adult whose height is equal to
or lower than the average height, or of a child. Even when
a very tall person falls outside the field of view of the
camera, his or her facial image can be input when he or she
bends down. An infant may be excluded from the process,
or his or her facial image may be input using a stool
or while being lifted in his or her parent's arms.
Also, the camera 1-2 can be used for a handicapped
person in a wheelchair.

The integration process is executed by the
processor 4 according to the sixth embodiment as
follows. That is, in the first embodiment, when
feature point detection based on at least one of facial
images from a plurality of cameras fails, it is
determined that the process at that time is not
successful, and the process is interrupted immediately.
Then, the process is retried or an error message is
displayed for the user. In the sixth embodiment, by
contrast, verification is done for each facial image from
which feature points can be successfully detected, and if
verification of at least one facial image is successful,
it is determined that "the person is identified".

More specifically, if verification levels between
facial images captured by the cameras 1-1 and 1-2, and
registered data (dictionary data) of a given person are
output as M1 and M2, it is determined that "that person
is identified" when a condition:

M1 >= K1 or M2 >= K2

where K1 and K2 are predetermined threshold values

is satisfied.

With this process, flexible verification can be
done in correspondence with individual height
differences.
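
The determination described above may be sketched as follows. This is an illustrative example only; the function name `identified_by_any` is hypothetical, and the use of `None` to mark a camera whose feature point detection failed is an assumption introduced for the sketch.

```python
from typing import Optional, Sequence


def identified_by_any(levels: Sequence[Optional[float]],
                      thresholds: Sequence[float]) -> bool:
    """Return True when Mi >= Ki holds for at least one camera.

    levels[i] is the verification level Mi for camera i, or None when
    feature points could not be detected in that camera's image
    (such a camera is simply skipped rather than aborting the process).
    thresholds[i] is the predetermined threshold value Ki.
    """
    if len(levels) != len(thresholds):
        raise ValueError("one threshold is required per camera")
    return any(m is not None and m >= k for m, k in zip(levels, thresholds))
```

With two cameras, `identified_by_any([m1, m2], [k1, k2])` evaluates the condition M1 >= K1 or M2 >= K2.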

In the sixth embodiment, two cameras are used.
However, three or more cameras may be used. In
addition, it may be determined that "the person is
identified" only when verification of a predetermined
number or more of facial images is successful, instead
of when verification of just one facial image is
successful. That is, if verification
levels between facial images captured by the cameras
1-1, 1-2,..., 1-N and registered data (dictionary data)
of a given person are output as M1, M2,..., Mn, it is
determined that "that person is identified" when a
condition:

R >= T (T > 2)

where R is the number of times Mi >= Ki (i = 1,
2,..., n) holds, and Ki and T are predetermined
threshold values

is satisfied.
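
The counting condition described above may be sketched as follows. This is an illustrative example only; the function name `identified_by_count` is hypothetical, and a camera whose image yielded no feature points is represented by `None` as an assumption of the sketch.

```python
from typing import Optional, Sequence


def identified_by_count(levels: Sequence[Optional[float]],
                        thresholds: Sequence[float],
                        t: int) -> bool:
    """Return True when the number R of cameras with Mi >= Ki is at least T."""
    if len(levels) != len(thresholds):
        raise ValueError("one threshold is required per camera")
    # R counts the cameras whose verification level reaches its threshold.
    r = sum(1 for m, k in zip(levels, thresholds) if m is not None and m >= k)
    return r >= t
```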

The seventh embodiment of the present invention 
will be described below. 

In the seventh embodiment, a camera used to obtain
a facial image is determined in the registration and
verification processes by the processor 4 in the sixth
embodiment. That is, upon registration, a camera used
to obtain a facial image is determined, and a facial
image is registered together with the camera number.
More specifically, as shown in FIG. 15, a registrant
determines one of the cameras 1-1 and 1-2 which he or
she would like to use depending on his or her physical
build and height, and selects that camera (S41).
Also, the registrant inputs an ID code via a keyboard,
card, or the like as his or her own identification
information.

A facial image input by the selected camera
undergoes the aforementioned image process to extract a
feature pattern (S1 to S5), and the extracted feature
pattern is registered in the dictionary 7 together with
the input ID code of the registrant, and the camera
number as the position information of the selected
camera (i.e., position information indicating the image
sensing position of an object image) (S7 to S10).

Note that the camera to be used may be determined
by the registrant himself or herself, or alternatively a
camera that can obtain a facial image from which feature
points corresponding to two pupils and two nasal cavities
are extracted, and the central portion of which is closest
to these positions, may be automatically determined. In
this example, information indicating a camera to be
used (camera number as camera position information) can
also be used as personal information, thus setting
a higher security level.
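
The automatic determination of the camera may be sketched as follows. This is an illustrative example only and rests on assumptions not stated in the embodiment: closeness is measured as the Euclidean distance between the image center and the centroid of the four feature points, and all cameras share one image size. The name `select_camera` is hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def select_camera(detections: Dict[int, Optional[List[Point]]],
                  image_size: Tuple[int, int]) -> Optional[int]:
    """Pick the camera whose image center is closest to the detected feature points.

    detections maps a camera number to its four feature points (two pupils,
    two nasal cavities), or to None when detection failed for that camera.
    Returns None when no camera yielded all four feature points.
    """
    center_x, center_y = image_size[0] / 2, image_size[1] / 2
    best_camera, best_distance = None, math.inf
    for camera, points in sorted(detections.items()):
        if points is None or len(points) != 4:
            continue  # this camera could not extract all four feature points
        mean_x = sum(p[0] for p in points) / 4
        mean_y = sum(p[1] for p in points) / 4
        distance = math.hypot(mean_x - center_x, mean_y - center_y)
        if distance < best_distance:
            best_camera, best_distance = camera, distance
    return best_camera
```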

Upon verification, a verification process is done
for only a facial image obtained from a camera
corresponding to the registered camera number. More
specifically, as shown in, e.g., FIG. 16, a person to
be verified inputs his or her own ID code via a
keyboard, card, or the like, and the dictionary 7 is
searched for the camera number and feature pattern
registered in correspondence with that ID code (S42).

A camera corresponding to the camera number
retrieved from the dictionary 7 is selected (S43), and
a facial image input by the selected camera undergoes
the aforementioned image process to extract a feature
pattern (S1 to S5). The extracted feature pattern is
verified against the feature pattern retrieved from the
dictionary 7 to determine if the person to be verified
is the person himself or herself (S11, S12).
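
The registration and verification with a camera number described above may be sketched as follows. This is an illustrative example only. The class `Dictionary`, the callback `capture`, and the function `similarity` are hypothetical names; in particular, `similarity` is a cosine similarity used as a stand-in verification level and is not the partial space method of the embodiments.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in verification level (cosine similarity); an assumption of this sketch."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class Dictionary:
    """Holds, per ID code, the camera number and the registered feature pattern."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, List[float]]] = {}

    def register(self, id_code: str, camera_number: int,
                 pattern: Sequence[float]) -> None:
        # Corresponds to registering the pattern with the ID code and camera number.
        self._entries[id_code] = (camera_number, list(pattern))

    def verify(self, id_code: str,
               capture: Callable[[int], Sequence[float]],
               threshold: float) -> bool:
        entry = self._entries.get(id_code)            # search by ID code (S42)
        if entry is None:
            return False
        camera_number, registered = entry
        pattern = capture(camera_number)              # use only that camera (S43)
        return similarity(pattern, registered) >= threshold
```

Here `capture` is assumed to return the feature pattern extracted from the image of the camera whose number it receives, so only the registered camera is ever consulted.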

In the seventh embodiment, two cameras are used.
Alternatively, three or more cameras may be used. In
addition, only one camera is selected, but when three
or more cameras are used, two or more cameras may be
selected.

The eighth embodiment of the present invention 
will be described below. 

In the first to seventh embodiments, a required
number of cameras are set in advance. However, in the
eighth embodiment, a camera moving mechanism is
arranged, and moves a single camera to a predetermined
position so as to execute the aforementioned process.
In this case, an image capture time is slightly
prolonged, but only one camera is required, resulting
in a simple setup.

FIG. 17 shows an example wherein one camera 1 is
set. More specifically, as shown in FIG. 17, the
camera 1 is fixed to a vertical guide rail 10 to be
slidable vertically, and is moved by a camera moving
mechanism (not shown) to a first position P1
(corresponding to the position of the camera 1-1 in
FIG. 14) or a second position P2 (corresponding to the
position of the camera 1-2 in FIG. 14). The camera 1
is set to be angled slightly upward.

The basic arrangement of an image processing
apparatus according to the eighth embodiment is
substantially the same as that in FIG. 1 (first
embodiment), except that the number of sets of cameras
and capture boards is reduced to one, and a camera
position controller 12 for controlling a camera moving
mechanism 11 that moves the camera 1, and a
communication board 13 for connecting the camera
position controller 12 to the system bus 3 are added,
as shown in FIG. 18.

The process executed by the processor 4 according
to the eighth embodiment is basically the same as that
in FIG. 4 (first embodiment), except for the contents
of the verification computation process in step S11.
The verification computation process in step S11 will be
described in detail below with reference to the flow
chart shown in FIG. 19.

The operation varies depending on whether 1 : 1 or 
1 : N verification is done (S21). Note that 1 : 1 
verification is a verification process executed when a 
person to be verified inputs his or her own ID code to 
designate registered data to be verified, and 1 : N 
verification is a verification process executed when a 
person to be verified does not input any ID code to 
select all registered data as data to be verified. 

If 1 : 1 verification is made, the ID code input
by the person to be verified is read (S22). Registered
data (partial space) corresponding to the read ID code
is read from the dictionary 7 (S23). A verification
level between the partial space of the registered data
and an input partial space is computed using,
e.g., the partial space method (S24). The verification
level is compared with a predetermined threshold value
(S25, S26) to output a 1 : 1 verification result (S27).

If 1 : N verification is made, all registered data 
to be verified are read from the dictionary 7 (S28). 
Verification levels with these registered data are 
computed (S29). A maximum one of the computed 
verification levels is selected (S30) and is output as 
a 1 : N verification result (S32). 

In this case, if the maximum verification level
undergoes a threshold value process as in step S31
(bounded by the broken line in FIG. 19), it can be
checked whether the verification result is correct (S33).
For example, if the verification level is too low, it
is determined that no match is found.

Note that 1 : N verification is a verification
process executed when a person to be verified does not
input any ID code to select all registered data as data
to be verified, and corresponds to the verification
process in the first to sixth embodiments described
above.
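
The branch between 1 : 1 and 1 : N verification may be sketched as follows. This is an illustrative example only. The function names are hypothetical, the dictionary is assumed to map an ID code directly to a registered feature vector, and `similarity` is a cosine similarity used as a stand-in verification level, not the partial space method.

```python
import math
from typing import Dict, Optional, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in verification level (cosine similarity); an assumption of this sketch."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def verify_one_to_one(dictionary: Dict[str, Sequence[float]], id_code: str,
                      pattern: Sequence[float], threshold: float) -> bool:
    """1 : 1 verification: compare only with the data registered for id_code."""
    registered = dictionary.get(id_code)
    if registered is None:
        return False
    return similarity(pattern, registered) >= threshold


def verify_one_to_n(dictionary: Dict[str, Sequence[float]],
                    pattern: Sequence[float],
                    threshold: Optional[float] = None) -> Optional[str]:
    """1 : N verification: return the ID code with the maximum verification level.

    When threshold is given, a maximum level below it is treated as
    "no match found" and None is returned (the optional threshold process).
    """
    if not dictionary:
        return None
    levels = {code: similarity(pattern, data) for code, data in dictionary.items()}
    best = max(levels, key=levels.get)
    if threshold is not None and levels[best] < threshold:
        return None
    return best
```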

In the eighth embodiment, only one camera is used,
but two or more cameras may be used. Furthermore, only
one camera is moved. However, when two or more cameras
are used, these cameras may be moved.

The embodiments of the present invention have been
explained, but the present invention is not limited to
these specific embodiments and various changes may be
made without departing from the scope of the invention.
For example, the aforementioned embodiments may be
appropriately combined.

In the above embodiments, a camera of a normal
visible wavelength range is used. Alternatively, an
infrared camera may be used,
or a rangefinder that can obtain distance information
(depth information) may be used. Also, image input
devices of different types may be mixed.

As described above, according to the present
invention, it is possible to provide an image processing
apparatus and method which can reduce the load on the
user upon registration and verification, can greatly
improve verification performance, and can minimize
increases in complicated computation and processing cost
without largely changing a pattern verification process
algorithm designed for a single direction.

By contrast, the technique of prior art 1
independently processes images although it uses a
plurality of cameras, and its function is limited to
detection of the face direction. In the techniques
of prior arts 2 and 3, since the countenance of a
person has three-dimensional shape information,
two-dimensional facial image information captured from
only one direction limits personal identification
performance, and practical verification performance
cannot become as high as that of fingerprint
verification, retina and iris scans, and the like.

Additional advantages and modifications will
readily occur to those skilled in the art. Therefore,
the invention in its broader aspects is not limited to
the specific details and representative embodiments
shown and described herein. Accordingly, various
modifications may be made without departing from the
spirit or scope of the general inventive concept as
defined by the appended claims and their equivalents.



